Compiling large language resources using lexical similarity metrics for domain taxonomy learning

Authors

  • Ronny Melz
  • Pum-Mo Ryu
  • Key-Sun Choi
Abstract

In this contribution we present a new methodology to compile large language resources for domain-specific taxonomy learning. We describe the stages needed to deal with the rich morphology of an agglutinative language, namely Korean, and outline a second-order machine learning algorithm that uncovers term similarity from a raw text corpus. The language resource compilation described is part of a fully automatic top-down approach to constructing taxonomies, without the human effort that is usually required.
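The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common reading of "second-order" similarity: two terms count as similar when their co-occurrence profiles are similar, even if the terms never appear together. The function names, the sentence-level co-occurrence window, the cosine measure, and the toy English corpus are all assumptions made for illustration; for Korean, the tokens would first have to come from a morphological analysis step, which is not shown here.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences):
    """First-order step: for each term, count the words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        unique = set(tokens)
        for term in unique:
            for other in unique:
                if other != term:
                    vectors[term][other] += 1
    return vectors


def cosine(a, b):
    """Second-order step: compare two terms through their co-occurrence profiles."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, already-tokenized toy corpus (not data from the paper).
sentences = [
    ["doctor", "treats", "patient"],
    ["physician", "treats", "patient"],
    ["pilot", "flies", "plane"],
]

vectors = cooccurrence_vectors(sentences)
# "doctor" and "physician" never co-occur, yet they share the same context words.
sim_related = cosine(vectors["doctor"], vectors["physician"])
# "doctor" and "pilot" share no context words at all.
sim_unrelated = cosine(vectors["doctor"], vectors["pilot"])
```

In this toy corpus the related pair scores close to 1 and the unrelated pair scores 0, which is the kind of signal a taxonomy learner could use to group terms. A real pipeline would likely weight the raw counts (for example with an association measure) rather than use them directly.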

Similar resources

The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...

The Relationship between Iranian Upper-Intermediate EFL Learners’ Contrastive Lexical Competence and Their Use of Vocabulary Learning Strategies

Regarding the vital role of lexical competence as an important requisite for the attainment of full mastery of the four language skills, this study tried to investigate the relationship between Iranian EFL learners’ contrastive lexical competence and their use of vocabulary learning strategies. To fulfil this objective, 60 Iranian upper-intermediate male and female language learners were select...

Data-driven Natural Language Generation: Paving the Road to Success

We argue that there are currently two major bottlenecks to the commercial use of statistical machine learning approaches for natural language generation (NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b) The scarcity of high quality in-domain corpora. We address the first problem by thoroughly analysing current evaluation metrics and motivating the need for a new, mor...

The Study and Review of Paraphrase Detection Techniques in Machine Learning

ABSTRACT: Paraphrase is a process of computing the semantic similarity between sentences, which are not lexicographically similar. Though a number of metrics for English language have been proposed in literature, to quantify textual similarity; it addresses the problem for detection of monolingual text-text lexical similarity. Existing system for Indian Language paraphrase detection uses lexica...

Measuring Semantic Textual Similarity of Sentences Using Modified Information Content and Lexical Taxonomy

In this paper, we present a survey and comparative studies on semantic textual similarity methods, those are based on WordNet taxonomy. We also proposed a new method for measuring semantic similarity between sentences. This proposed method, uses the advantages of taxonomy methods and merge these information to a language model. It considers the WordNet synsets for lexical relationships between ...

Publication date: 2006